probabilistic information which must be combined. We studied a simple
cascaded inference task in which one individual has information about
the diagnosticity of an event and the other has information about the
probability that the event has occurred (reliability information). Our
main purpose was to compare two people working together with an individual
combining both types of information, and to assess the relative
impact of reliability and diagnosticity. Both nomothetic and idiographic
analyses of the responses indicated that diagnosticity has a greater
impact than reliability on judged responses. This effect was not mediated
by subject sex, nor by whether the information was integrated by an
individual working alone or by two people, each given either diagnosticity
or reliability. Naming either the "reliability" person or the "diagnosticity"
person as responsible for the group response did not alter
the findings appreciably. Our results are consistent with the bulk of
the subjective inference/prediction literature, which suggests that subjects
overemphasize the impact of diagnostic information by not taking full
account of other relevant information, such as imperfect correlation in
prediction problems and base-rate information in Bayesian inference.
results; these can be found in the self-contained technical reports. Instead,
we briefly review the studies and results, focusing on interpretation.
Our current research has led to many new and exciting ideas, and
we will use this report as a means of explicating them.

II. Computer vs. Analyst

Increasingly, practitioners of decision analysis are turning to computers to
supplement the role of the decision analyst. For the most part, the state
of the art in decision software is at a level of data storage, display, and
computation as an aid to a sophisticated user. Undoubtedly, these developments
are useful, and we see no serious intellectual issue to be raised concerning
this supplementary role of computers in decision analysis. But the
next generation of decision software will surely be designed to perform a
larger range of analyst functions.

Our focus is on identifying potential problems challenging the computerization
of decision analysis, and assessing the extent to which these
problems can be overcome. Two problems seem particularly salient to us.

First, much of what goes on in decision analysis, especially during structuring,
is more accurately described as "art" than as "science". To what extent can
this often ill-defined art be transformed into software? Secondly, past
consumers of decision analysis have expressed satisfaction with both the
process and the conclusions of analyses. To what extent is this satisfaction
a function of the formal methods and procedures embodied in decision theory
and the technology of decision analysis, and to what extent do other factors
such as personal interaction and the establishment of a rapport account for
client approval?

To answer these questions, John, von Winterfeldt, and Edwards (see
Technical Report No. 81-1 for a more complete description) examined the
quality and user acceptance of simple MAU analysis performed by an analyst
vs. a stand-alone computer package called MAUD (for Multi Attribute Utility
Decomposition; see Humphreys and Wisudha, 1980; Humphreys and McFadden,
1980). Unlike other packages from DARPA that we could obtain and look at,
MAUD "was designed to work in direct interaction with the decision maker,
without a decision analyst, counselor, or other 'expert' as intermediary"
(Humphreys and McFadden, 1980).

Given a predetermined set of alternatives (at least 4 and not more than 8),
MAUD guides the decision maker through a highly structured series of
interactions resulting in aggregate alternative values and an implied ordering




TABLE OF CONTENTS


Acknowledgement

Disclaimer

Summary

I. Introduction

II. Computer vs. Analyst

III. Hierarchical vs. nonhierarchical MAU structures

IV. Group structure, datum diagnosticity and source
reliability in hierarchical inference

References




ACKNOWLEDGEMENT 


This research was supported by the Advanced Research Projects
Agency of the Department of Defense, prime contract MDA 903-80-C-0194
(ARPA), and subcontracted from Decisions and Designs, Inc., #79-312-0731.

The authors would like to thank Peggy Giffin, Greg Griffin, J. Robert Newman,
and William Stillwell, who made many valuable contributions to the research
summarized in this report.


DISCLAIMER 


The views and conclusions contained in this document are those
of the authors and should not be interpreted as necessarily representing
the official policies, either expressed or implied, of the Advanced
Research Projects Agency or the United States Government.


scale (rather than 9 points), and weights were obtained by direct
ratio estimation of "importance". Most sessions, both analyst and
MAUD, lasted between 1 and 2 hours.
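
As a rough illustration of how directly estimated importance ratios can be turned into an implied ordering, the sketch below normalizes raw ratio judgments and applies a weighted-additive rule. It is only a minimal sketch under stated assumptions: the additive rule, the function names, the 0-100 rating scale, and the apartment data are all hypothetical and are not taken from the analyst sessions or from MAUD itself. Only two alternatives are shown for brevity.

```python
def normalize_weights(ratio_weights):
    """Rescale raw ratio-estimated importance judgments so they sum to 1."""
    total = sum(ratio_weights.values())
    return {attribute: w / total for attribute, w in ratio_weights.items()}


def aggregate_values(ratings, weights):
    """Weighted-additive value of each alternative (assumed aggregation rule)."""
    return {
        alternative: sum(weights[a] * scores[a] for a in weights)
        for alternative, scores in ratings.items()
    }


def implied_ordering(values):
    """Alternatives sorted from most to least preferred."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical data: raw importance ratios and ratings on a 0-100 scale.
raw_importance = {"cost": 10, "location": 20, "size": 10}
ratings = {
    "apartment_a": {"cost": 80, "location": 40, "size": 60},
    "apartment_b": {"cost": 50, "location": 90, "size": 30},
}

weights = normalize_weights(raw_importance)
values = aggregate_values(ratings, weights)
print(weights)
print(implied_ordering(values))
```

Normalizing first keeps the aggregate values on the same scale as the ratings, which makes them easier to compare with a subject's direct holistic ratings.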

Although subjects overwhelmingly gave more favorable reports for
the analyst session than for the MAUD session, subjects' agreement with and
acceptance of the analyst and MAUD results (implied ordering and most preferred
alternative) did not differ. Specifically, subjects indicated a
desire to use the analyst rather than MAUD in future decisions and confidence
that the analyst rather than MAUD "found" the best alternative
by a ratio of 8 to 1. Most subjects thought that the analyst session was
more helpful (5 to 1), more comfortable (4 to 1), and more effective in
discovering new aspects of the problem (3 to 1) than the MAUD session.

Yet,  only  a  slim  majority  (54%)  chose  the  analyst  ratings  over  the  MAUD 
ratings.  Furthermore,  only  small  differences  were  found  between  analyst 
and  MAUD  ratings  with  respect  to  the  number  of  order  reversals  (Kendall 
Tau)  with  direct  holistic  ratings  by  the  subject.  Likewise,  MAUD  and 
analyst  ratings  did  not  differ  substantially  in  the  number  of  times  they 
matched  the  most  preferred  alternatives.  In  short,  although  subjects 
reported  liking  analyst  sessions  better  than  MAUD  sessions,  no  differences 
emerged  in  subjects'  acceptance  of  resulting  evaluations.  These  results 
were  not  substantially  mediated  by  session  order,  problem  type,  analyst, 
or  subject  sex  and  race. 
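
The comparison of model orderings with direct holistic ratings can be sketched as a count of order reversals (discordant pairs) and the corresponding Kendall tau. This is a minimal sketch assuming the simple tau-a form with tied pairs ignored; the function names and the four-alternative data are hypothetical, and the technical report may have used a different variant.

```python
from itertools import combinations


def concordance_counts(model_values, holistic_ratings):
    """Return (concordant, discordant) pair counts; discordant pairs are order reversals."""
    concordant = discordant = 0
    for a, b in combinations(model_values, 2):
        product = (model_values[a] - model_values[b]) * (
            holistic_ratings[a] - holistic_ratings[b]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie on one measure; ties are ignored here
    return concordant, discordant


def kendall_tau_a(model_values, holistic_ratings):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant, discordant = concordance_counts(model_values, holistic_ratings)
    n = len(model_values)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical MAU values and holistic ratings for four alternatives.
mau_values = {"a": 65, "b": 55, "c": 40, "d": 30}
holistic = {"a": 70, "b": 80, "c": 50, "d": 20}

print(concordance_counts(mau_values, holistic))
print(kendall_tau_a(mau_values, holistic))
```

In this hypothetical case only the pair (a, b) is reversed, so five of the six pairs are concordant.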

Two analysts tended to differ systematically in subjects' approval of
their sessions and subjects' acceptance of the final ratings, as well as
in the number of attributes generated. The median number of attributes
elicited was greater for analyst sessions (7.5) than for MAUD sessions
(5.9); however, one analyst averaged 10 attributes per session, while another
averaged only a little over 5. The 10-attribute analyst was rated higher
than the other four analysts in terms of subjects' impressions of the sessions,
but received the lowest amount of acceptance of the resulting alternative
orderings. The five-attribute analyst, however, received the lowest subjective
ratings of all, but achieved the greatest degree of acceptance of
final alternative orderings. Our findings seem to indicate that subjects
feel better taken care of when more attributes are included in the analysis,
but that subjects' holistic ratings are better accounted for by analyses
with smaller rather than larger numbers of attributes. Of course, it is
easy to overinterpret these results, and replication is certainly required.
SUMMARY


This report summarizes 15 months of research on the technology of
inference and decision, including topics such as: the quality and user
acceptance of decision analysis performed by computer vs. analyst; the
effect of hierarchical vs. nonhierarchical structures on MAU importance
weights and ratings; and the relative impacts of group structure, source
reliability, and datum diagnosticity on hierarchical inference judgments.

The purpose of this report is to summarize findings and explain how they
integrate into an overall program of research on decision technology.

For the most part, the state of the art in decision software is at
a level of data storage, display, and computation as an aid to a sophisticated
user. Almost certainly, the next generation of decision software
will be designed to perform a larger range of analyst functions. We
have focused on identifying potential problems challenging the computerization
of decision analysis, and on assessing the extent to which these
problems can be overcome. Two questions are particularly salient:
first, to what extent can the often ill-defined art of structuring be
transformed into software; and secondly, to what extent is past consumers'
satisfaction with decision analysis a function of the formal methods,
procedures, and rationale of decision theory, and to what
degree do other factors such as personal interaction and the establishment
of a rapport account for client approval? We compared multiattribute utility
analyses of personal decision problems of undergraduates performed by a
human analyst vs. those performed by a "stand-alone" software package,
Multi Attribute Utility Decomposition (MAUD). Although subjects overwhelmingly
gave more favorable reports for the analyst session than
for the MAUD session, subjects' agreement with and acceptance of the
analyst and MAUD results (implied ordering and most preferred alternative)
did not differ. We did find that subjects feel better taken care of when
more attributes are included in the analysis, but that subjects' holistic
ratings are better accounted for by analyses with smaller rather than
larger numbers of attributes. We found that MAUD is not as "stand-alone"
as its developers have advertised. In particular, our subjects needed
at least some instruction in the attribute elicitation phase of the program.


Overall, our subjects became quite involved in both MAUD and analyst
sessions. Subjective ratings of both sessions were greatly skewed toward
the high end. Subjects were highly motivated, and their responses seemed
more thoughtful and considered than is often the case for thought experiments
with hypothetical scenarios, typical of laboratory experiments with college
subjects.

Unfortunately,  it  is  somewhat  difficult  to  interpret  subjects'  acceptance 
of  the  resulting  ordering  of  an  analysis  within  this  paradigm.  Low  acceptance 
could  mean  that  the  analysis  has  totally  gone  awry,  or  it  could  be  indicative 
of  a  deeper,  more  valid  evaluation  than  the  subject  is  capable  of  in  his/her 
holistic  ratings.  Regardless  of  the  interpretation,  it  is  important  that 
analyst  and  MAUD  orderings  did  not  differ  in  terms  of  subject  acceptance. 

Of course, our findings cannot be interpreted in a vacuum. Proper
consideration should be given to the subject population, problem types,
analyst experience and method (SMART), and the particular MAUA software we
employed (MAUD). In particular, we should comment on the peculiarities of
the MAUD program. We found that MAUD is not as "stand-alone" as its developers
have advertised. In particular, our subjects needed at least some instruction
in the attribute elicitation phase of the program. Typical mistakes included:
repetition of attributes (up to 15 times); including more than one attribute
in a given attribute definition; and thinking about other attributes when
specifying the "ideal point" and/or scale values on an attribute. MAUD
should give the subject more information concerning attribute elicitation,
as the "difference questions" are simply too abstract and nondirective.

We also found that very few subjects are able to answer the BRLT weighting
question properly. In particular, most exhibit a sort of risk aversion, in
which they only equate the sure thing to the gamble when the gamble odds
favor the more favorable outcome. Of course, this response strategy will render
the weights almost totally meaningless, since they will be solely dependent
upon the order (essentially random in MAUD) in which the two varying attributes
are presented. In short, the risk aversion problem with the BRLT question may
result in random weights in MAUD. To circumvent this, we intervened at the point
when the subject began the BRLT portion of the program, and attempted to
explain the BRLT question in terms of the "importance" of the two varying
attributes. In particular, indifference between the sure thing and the
gamble for odds of 1:1 was equated to attributes of equal importance. Odds
of greater and less than 1:1 were also explained in terms of the relative
importance of the attributes.


We also found that most subjects are unable to answer the BRLT weighting
question properly; uninstructed responses exhibit a sort of risk aversion
that renders the weights virtually meaningless. Overall, subjects were
highly motivated, and their responses seemed more thoughtful and considered
than is often the case for thought experiments with hypothetical
scenarios, typical of laboratory experiments with college subjects.

Hierarchical MAU structuring (and weighting) is a particularly
attractive approach to building value trees when the number of attributes
is large; partly because it reduces the number of necessary judgments,
and partly because it avoids weighting questions in which only remotely
related attributes need to be compared. But there are several potential
problems: for example, respondents may add to an upper level value
meaning not captured by its lower branches; ranges of alternatives, which
can often be made explicit in specific attributes, become more vague at
higher levels, perhaps distorting range dependent importance weights;
furthermore, it is not clear whether numbers elicited hierarchically and
non-hierarchically are consistent. If not, which should we trust more?

We studied subjects' weighting and rating judgments for both hierarchical
and non-hierarchical value trees relevant to evaluation of alternatives
for electricity production. Hierarchical weights were found to be more
variable than non-hierarchical weights, essentially replicating a result
reported in 1973. We also found that subjects are often inconsistent
in their attribute ratings across different levels of the value hierarchy.
Random error, value lability, and misunderstanding of higher level attributes
are all possible explanations for this result. Finally, we found
that subjects' weight sets did not differ to any great extent as a function
of their preferred alternatives; rather, location measures were the primary
determinant of preference orderings. Policy decision making, at least in
some highly charged arenas, may be more a matter of one's perception of how
well each strategy will accomplish the stated goals, and not one sensitive
to the tradeoffs among different goals.

Many important and interesting probabilistic information processing
tasks are essentially hierarchical or cascaded in formal structure, and
involve situations in which different people have different types of

It is important to note that all interventions into the MAUD sessions
(both for attribute elicitation and BRLT questions) were kept short and
detached  from  the  flow  of  the  interaction  between  MAUD  and  the  subject. 
Information  and  instructions  were  given  only  for  clarifying  those  points 
necessary  for  the  MAUD  assessments.  In  short,  all  extra-MAUD  interaction 
within  the  MAUD  session  was  kept  as  unobtrusive  as  possible. 

Results concerning quality of attribute sets (in terms of completeness,
value independence, etc.) are included in the full technical report.

III.  Hierarchical  vs.  nonhierarchical 
MAU  structures 

Complex evaluation problems can usually be aided by the construction
of a value tree which organizes general values, intermediate objectives, and
final value relevant attributes in a hierarchy. MAU models can then be
built in two ways:

1)  By  ignoring  the  hierarchical  structure  and  performing  the 
weighting  and  rating  tasks  on  lowest  attributes  (twigs)  only; 

2)  By  weighting  branches  at  each  level  of  the  tree  under  a  given 
node  and  computing  final  attribute  weights  by  multiplying  down 
the  tree. 
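
The second approach, multiplying branch weights down the tree, can be sketched as follows. This is a minimal sketch, assuming the weights under each node are normalized to sum to 1; the tree representation, the function name, and the attribute names and numbers are hypothetical and are not the value tree used in the experiment. The example uses two levels for brevity, but the recursion handles any depth.

```python
def twig_weights(tree, parent_weight=1.0):
    """Compute final twig weights by multiplying branch weights down the tree.

    `tree` maps a branch name to a (weight, subtree) pair, where subtree is
    another such mapping, or None when the branch is a twig.
    """
    result = {}
    for name, (weight, subtree) in tree.items():
        path_weight = parent_weight * weight
        if subtree:
            result.update(twig_weights(subtree, path_weight))
        else:
            result[name] = path_weight
    return result


# Hypothetical value tree; weights under each node sum to 1.
value_tree = {
    "health_safety_environment": (0.6, {
        "public_health": (0.5, None),
        "environmental_impact": (0.5, None),
    }),
    "economics": (0.4, {
        "cost": (0.75, None),
        "jobs": (0.25, None),
    }),
}

final_weights = twig_weights(value_tree)
print(final_weights)
```

Because each node's branch weights sum to 1, the final twig weights also sum to 1 and can be compared directly with weights elicited nonhierarchically on the twigs.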

The hierarchical weighting model can, furthermore, be coupled with
ratings of options on different levels of the tree to examine the internal
consistency of MAU models and judgments at various levels of aggregation.

Hierarchical weighting is especially attractive when the number of
attributes is very large; partly because it reduces the number of necessary
judgments, and partly because it avoids weighting questions in which only
remotely related attributes need to be compared. But there are also problems;
for example, respondents may add to an upper level value meaning not
captured by its lower branches; ranges of alternatives, which can often
be made explicit in specific attributes, become more vague at higher levels,
perhaps distorting range dependent importance weights; furthermore, it is
not clear whether numbers elicited hierarchically and nonhierarchically
are consistent. If not, which numbers should we trust more?

To answer some of these questions, Stillwell, von Winterfeldt, and
Edwards (Technical Report No. 81-2) performed an experiment in which 37
undergraduate students evaluated three scenarios for electricity production
(coal vs. nuclear vs. geothermal coupled with strict conservation measures).
MAU ratings and weights were elicited for 13 attributes, which a previous
study had found relevant for this evaluation.


These same attributes were then arranged in a hierarchical fashion, and
subjects judged importance on each of three levels of the hierarchy.

Upper levels are simply combinations of lower level attribute sets. The final
weight for a lower level attribute (twig) is computed by multiplying the weights
of the branches that include that twig at each of the three levels of the hierarchy.
Hierarchical weights were found to be more variable than nonhierarchical
weights; this finding essentially replicates a result reported by Sayeki
and Vesper in 1973.

From the standpoint of structuring, a more interesting result followed
from subjects' attribute ratings of alternatives (location measures) at
each level of the hierarchy. Surprisingly, subjects are often inconsistent in
their ratings. For example, coal may be rated higher than nuclear on an
upper level attribute, while nuclear is rated higher than coal on all of the
sub-attributes making up that attribute. Three explanations seem plausible.
First, subjects may simply err in expressing their values. A second
possibility is that our subjects' values are extremely labile; between
the time subjects were asked to rate options at the lower and higher levels,
their values changed. The third explanation is that subjects imbue higher
level attributes with a richer meaning than the structure below that attribute
warrants, creating a sort of "super dimension" of undelineated aspects.

Further  research  is  warranted  on  this  topic,  as  a  case  can  be  made  for  all 
three  interpretations. 

Finally, we found that subjects' weight sets did not differ to any
great extent as a function of their preferred alternatives; e.g., subjects
favoring the nuclear option assigned weights which were similar to those of
subjects preferring coal. All groups gave the highest weight to the
health/safety/environment factor. Instead of weights, location measures were
the primary determinant of preference orderings among the three alternatives. This
suggests that preferences are not determined by the extent to which one is
willing to trade off one attribute for another, as has often been asserted.
On the contrary, it appears from these data that one's perception of the
alternatives' standing on the various attributes determines preference (or
vice versa). Weights seem to play little role. Brickman, Shaver, and
Archibald (1969) reported much the same finding in an attitude study of
various foreign policies of the United States towards the Vietnam conflict.

In a study on attitudes towards nuclear power, Otway, Maurer, and Thomas


I. Introduction

This final report summarizes work by the Social Science Research
Institute, University of Southern California, supported by the Advanced
Research Projects Agency of the Department of Defense under prime contract
MDA903-80-C-0194, under subcontract from Decisions and Designs, Inc. The
research conducted during this contract period, from November 1, 1979 to
January 31, 1981, under the direction of Professor Ward Edwards, the
Principal Investigator, grew out of a program of research supported by
ARPA for the study of the technology of inference and decision. Edwards
(1973, 1975), Edwards and Seaver (1976), Edwards, John and Stillwell
(1977, 1979), and Edwards and Stillwell (1980) summarize previous research.

Our past research was concerned with the simplification of decision
analysis, and, specifically, with developing and validating simple techniques
for multiattribute utility analysis (MAUA). Both our laboratory
and real world validation studies demonstrated that very simple rating
and ranking methods perform just as well as more complicated techniques.

These validation studies, combined with earlier sensitivity analysis, led
us to conclude that within a given structure the precise methods of
eliciting numbers matter little. Varying problem settings and structure
could, of course, have strong effects on elicited numbers and results of
the analysis.

The research summarized in this report examined such variations
in problem setting and structure. Specifically, we performed three
experiments on the following topics:

(1) the quality and user acceptance of decision analysis
performed by computer vs. analyst;

(2) the effect of hierarchical vs. nonhierarchical structures
on MAU importance weights and ratings;

(3) the relative impacts of group structure, source reliability,
and datum diagnosticity on hierarchical
probabilistic inference judgments.

Our research on these issues and problems is reported in three technical
reports which are now being prepared.

The purpose of this report is to summarize our findings and to explain
how they integrate into an overall program of research on decision technology.
Thus, we do not report detailed descriptions of procedures and
result is not encouraging for those who view attribute ratings of alternatives
as a task for "technicians" who should know the "facts", and
importance weighting as the primary function of the decision maker charged
with the value laden task of trading off one deserving goal for another.

Policy decision making, at least in arenas as highly charged as nuclear
power and Vietnam, may be more a matter of one's perception of how well
each strategy will accomplish the stated goals, and not one sensitive
to the tradeoffs among different goals.

This result, coupled with the finding that attribute ratings of
alternatives are highly inconsistent, is disconcerting. In effect, this
study suggests that the structure and labels of attributes may, to a great
extent, determine the final preference ordering of the analysis.

IV. Group structure, datum diagnosticity,
and source reliability in hierarchical inference

Many important and interesting probabilistic information processing
tasks are essentially hierarchical or cascaded in formal structure, and
involve situations in which different people have different types of probabilistic
information which must be combined. Griffin and Edwards (for
a more complete description see Technical Report No. 81-3) studied a simple
cascaded inference task in which one individual has information about the
diagnosticity of an event and the other has information about the probability
that the event has occurred (reliability information). The main purpose of
this experiment was to compare two people working together when combining
such information with an individual with both types of information, and to
assess the relative impact of reliability and diagnosticity in both situations.

In the symmetric, two-hypothesis case, reliability and diagnosticity
should be equally important in determining the aggregate odds:

L = (L_r L_d + 1) / (L_r + L_d),    (1)

where L is the aggregate odds, L_r is the reliability likelihood ratio, and
L_d is the diagnosticity likelihood ratio. Thus, diagnostic information at
odds of 10 to 1 that has only a 2 to 1 chance in favor of being true should
be just as convincing as diagnostic information at odds of 2 to 1 that
has a 10 to 1 chance in favor of being true.
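
The symmetry claim can be checked numerically. The sketch below assumes that Equation (1), which is garbled in the scanned copy, has the symmetric form L = (L_r L_d + 1) / (L_r + L_d), with L_r the reliability likelihood ratio and L_d the diagnosticity likelihood ratio; the function name is hypothetical.

```python
def aggregate_odds(l_r, l_d):
    """Aggregate odds for the symmetric two-hypothesis cascaded inference case.

    Assumed form of Equation (1): L = (L_r * L_d + 1) / (L_r + L_d).
    """
    return (l_r * l_d + 1) / (l_r + l_d)


# The two cases from the text: 10-to-1 diagnosticity with 2-to-1 reliability,
# and 2-to-1 diagnosticity with 10-to-1 reliability.
print(aggregate_odds(l_r=2, l_d=10))
print(aggregate_odds(l_r=10, l_d=2))

# A report that is as likely false as true (L_r = 1) carries no information.
print(aggregate_odds(l_r=1, l_d=10))
```

Under this assumed form both cases give aggregate odds of 21/12 = 1.75 to 1, so a normative judge should weight the two likelihood ratios equally.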

Undergraduates (31 two-person groups and 10 individuals working alone) were
presented with a scenario in which a judgment had to be made about the
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results;  these  can  be  fcxmd  in  the  self-contained  technical  reports.  In¬ 
stead,  we  briefly  review  the  studies  and  results,  focusing  on  interpre¬ 
tation.  Our  current  research  has  led  to  many  new  and  exciting  ideas,  and 
we  will  use  this  report  as  a  means  of  explicating  than. 

II.  Conqjuter  vs.  Analyst 

Increasingly,  applyers  of  decision  analysis  are  turning  to  ccmputers  to 
supplement  the  role  of  the  decision  analyst.  For  the  most  part,  the  state 
of  the  art  in  decision  software  is  at  a  level  of  data  storage,  display,  and 
computation  as  an  aid  to  a  sophisticated  user.  Undoubtably,  these  develop¬ 
ments  are  useful,  and  we  see  no  serious  intellectual  issue  to  be  raised  con¬ 
cerning  this  supplementary  role  of  computers  in  decision  analysis.  But  the 
next  generation  of  decision  software  will  surely  be  designed  to  perform  a 
larger  range  of  analyst  functions. 

Our focus is on identifying potential problems challenging the computerization
of decision analysis, and assessing the extent to which these
problems can be overcome. Two problems seem particularly salient to us.
First, much of what goes on in decision analysis, especially during structuring,
is more accurately described as "art" than as "science". To what extent can
this often ill-defined art be transformed into software? Secondly, past
consumers of decision analysis have expressed satisfaction with both the
process and the conclusions of analyses. To what extent is this satisfaction
a function of the formal methods and procedures embodied in decision theory
and the technology of decision analysis, and to what extent do other factors
such as personal interaction and the establishment of a rapport account for
client approval?

To answer these questions, John, von Winterfeldt, and Edwards (see
Technical Report No. 81-1 for a more complete description) examined the
quality and user acceptance of simple MAU analysis performed by an analyst
vs. a stand-alone computer package called MAUD (for Multi Attribute Utility
Decomposition; see Humphreys and Wisudha, 1980; Humphreys and McFadden,
1980). Unlike other packages from DARPA that we could obtain and examine,
MAUD "was designed to work in direct interaction with the decision maker,
without a decision analyst, counselor, or other 'expert' as intermediary"
(Humphreys and McFadden, 1980).

Given a predetermined set of alternatives (at least 4 and not more than 8),
MAUD guides the decision maker through a highly structured series of
interactions resulting in aggregate alternative values and an implied ordering
likelihood that a job applicant will be successful, assuming that he or
she is hired. The diagnosticity information was a test result with
known validity; the reliability information was the odds that an
unreliable tester actually reported the true versus a random test result.
Subjects were asked to give odds based on 12 different pairs of reliability
and diagnosticity likelihood ratios ($L_r$, $L_d$); they
were told that a monetary payoff of up to $5.00 would be given contingent
on their performance. Both nomothetic and idiographic analyses of the
responses indicated that diagnosticity has a greater impact than reliability
on judged responses. This effect was not mediated by subject
sex, nor by whether the information was integrated by an individual working
alone or by two people, each given either diagnosticity or reliability.
Furthermore, naming either the "reliability" person or the "diagnosticity"
person as "responsible" for the group response did not alter the findings
appreciably.

Substantively, this experiment has replicated a robust finding in the
subjective inference/prediction literature. Kahneman and Tversky (1973,
1979) and others have demonstrated that subjects predicting a criterion
score (e.g., GPA) from a predictor score (e.g., SAT score) tend to ignore
the fact that the two are not perfectly correlated. They treat all predictors
as though they were equally valid, failing to modify the predictor score
optimally by regressing it toward the mean. In effect, subjects
ignore certain information about the worth or credibility of extreme
diagnostic information, just as our subjects devalued the explicit information
about diagnosticity conveyed in the reliability probability.

This finding is also consistent with the so-called "base-rate fallacy"
(Kahneman and Tversky, 1973; Lyon and Slovic, 1976). According to Bayes'
Theorem, a prior of 2 to 1 coupled with diagnostic information of 100 to 1
should be as compelling as a prior of 100 to 1 and diagnostic information
of 2 to 1. Many studies have shown that the base-rate information
will not be weighed into the posterior odds judgments to the appropriate
extent (see Nisbett and Ross, 1980, for a review). In all three of these
examples, subjects over-emphasized the impact of diagnostic information by
not taking full account of other information: imperfect correlation in the
prediction problem, base-rate information in the Bayesian inference problem,
and reliability of the diagnostic information in the hierarchical inference
problem.
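
The arithmetic behind the Bayes' Theorem example can be checked in a few lines of Python. This is only an illustrative sketch: the function names are ours, and it assumes the two-hypothesis odds form of Bayes' Theorem (posterior odds = prior odds x likelihood ratio).

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' Theorem for two hypotheses."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of a hypothesis to a probability."""
    return odds / (1.0 + odds)


weak_prior_strong_datum = posterior_odds(prior_odds=2.0, likelihood_ratio=100.0)
strong_prior_weak_datum = posterior_odds(prior_odds=100.0, likelihood_ratio=2.0)

# Both cases give the same posterior odds of 200 to 1.
print(weak_prior_strong_datum, strong_prior_weak_datum)
print(odds_to_probability(weak_prior_strong_datum))
```

Because the prior odds and the likelihood ratio enter only through their product, swapping their values cannot change the normative answer.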

It  is  important  to  note  the  critical  role  of  stimulus  presentation  on  these 
types  of  experiments.  The  present  study  conveyed  the  reliability  and  diag- 


of the alternative set. MAUD develops a set of attributes by asking for
descriptions of how the various alternatives differ. Single attribute value
functions are elicited by placing each alternative on a nine-point rating
scale, determining an ideal point, and normalizing under an assumption of
piecewise linearity. Finally, importance weights are assessed under an
additivity assumption via the basic reference lottery tickets (brlts)
procedure (Keeney and Raiffa, 1976). The final aggregate values resulting
from a MAUD analysis are a hybrid of riskless rating-scale single attribute
values and risky brlts importance weights.

Thirty-five undergraduates of sixty-seven interviewed (52% selection
ratio) volunteered to undergo MAUA with both an analyst and MAUD for a
choice dilemma that (1) was personally important and relevant, (2) involved
four or more viable alternatives, and (3) required information that was
readily accessible. The experiment is unique in that the multiattribute
evaluation problems were generated by the subjects, and not by the experimenter
as a thought experiment on a hypothetical scenario. Problems
included choosing among majors (11), colleges to which to transfer (9),
places to live (6), careers (4), travel plans (2), automobiles (1), sports
activities (1), and strategies for handling a roommate difficulty (1).

After problems and alternative sets were discussed and agreed on with
the experimenter (not an analyst), subjects were assigned to either the
MAUD-first or analyst-first condition, and underwent the first analysis.
The second analysis, either MAUD or analyst, came approximately one week
after the first. Five different analysts were utilized, including two research
faculty, one seventh-year graduate student, and two first-year graduate
students. All subject-analyst assignments were made at the convenience
of both parties. None of the analysts had more than cursory experience with
applying MAUA for personal decision problems, and the two first-year students
learned of MAU ideas only a few weeks before their involvement in the study.

Although details of procedure may have varied across analysts, all used
a version of MAUA much like the simple multiattribute rating technique
(SMART; Edwards, 1977). This differed from the MAUD analyses in that no
procedure, however vague, was specified for obtaining attributes. Analysts
used one or more of the following methods: (1) suggestion, (2) MAUD-like
difference question, (3) asking "How is this particular alternative
attractive?", (4) requiring the subject to find one aspect on which each
alternative is attractive, (5) asking "What dimensions/attributes do you want to
consider?", and (6) asking "What factors are relevant to the decision?"

Also, single dimension values were assigned via a 100 point rating
nosticity information in the form of contingency tables with appropriate
summary statistics. (The cover story involved an unreliable report of
pass/fail on a test that was somewhat diagnostic of success/failure in
a job situation.) Past research on the base-rate fallacy has shown that
priors will be utilized more when they are either causal to the event
hypotheses (Bar-Hillel, 1980; Tversky and Kahneman, 1980) or imparted
to the subject in a concrete, trial by trial manner (Manis, Dovalina,
Avis, and Cardoze, 1980; see also Kassin, 1979). We can only speculate
that such manipulations would have had similar effects on the utilization of
reliability information in our experiment. Although contingency tables
are more concrete than a single summary number, they are still relatively
abstract (cf. Manis et al., 1980).

Methodologically, this experiment points out two important contrasts in
the way judgment and decision researchers view their data: (1) nomothetic
vs. idiographic analysis and (2) no-effect vs. optimal-model null hypothesis
testing (see Einhorn and Hogarth, 1981). Nomothetic analyses (exploring
response patterns of group mean responses) are required when a "between
subjects" design is used. However, when a "within subjects" design is
employed, idiographic analysis (exploring typical response patterns of
individual subject responses) is usually preferred, although either is
appropriate. Both ways of looking at our data suggest that responses are
based on only slightly modified diagnostic information; however, only the
nomothetic gives an idea as to the specific heuristic strategy subjects
actually employ. Patterns of mean responses indicate that subjects' log
responses are quite close to the product of the log diagnosticity likelihood
ratio (log $L_d$) and the reliability probability. That is, subjects tend to
use the reliability probability
to adjust the log diagnostic odds downward. This interpretation can be
distinguished from the alternative heuristic wherein subjects adjust the
diagnostic probability or the diagnostic likelihood ratio with the reliability
probability directly. Because these heuristic models are so
highly correlated with each other and also with the optimal (modified Bayes'
theorem), an idiographic analysis cannot distinguish among the four
possibilities.
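
To make the contrast between the log-odds heuristic and the optimal model concrete, the following Python sketch computes both for a few stimulus pairs. Several things here are assumptions rather than details taken from the study: the optimal model is read as L = (L_r L_d + 1) / (L_r + L_d), the reliability probability is taken to be L_r / (1 + L_r), natural logarithms are used, and the stimulus pairs and function names are purely illustrative.

```python
import math


def optimal_log_odds(l_r: float, l_d: float) -> float:
    """Log aggregate odds under the assumed optimal (modified Bayes) model."""
    return math.log((l_r * l_d + 1.0) / (l_r + l_d))


def heuristic_log_odds(l_r: float, l_d: float) -> float:
    """Heuristic: scale the log diagnostic odds by the reliability probability."""
    reliability_probability = l_r / (1.0 + l_r)  # assumed odds-to-probability conversion
    return reliability_probability * math.log(l_d)


# Hypothetical stimulus pairs, not the 12 pairs used in the experiment.
for l_r, l_d in [(2.0, 10.0), (10.0, 2.0), (5.0, 5.0)]:
    print(l_r, l_d,
          round(heuristic_log_odds(l_r, l_d), 3),
          round(optimal_log_odds(l_r, l_d), 3))
```

Under these assumptions the optimal model treats the pairs (2, 10) and (10, 2) identically, while the heuristic gives the more diagnostic but less reliable datum the larger response, which is the direction of the effect described above.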

One of the greatest sources of confusion in the judgment literature
(especially with regard to base-rates) is the difference between testing
the null hypothesis that an information manipulation (e.g., reliability)
had no effect, versus testing the optimal model hypothesis. The finding
in our study is the usual one, namely, that the manipulation (reliability)
scale (rather than 9 points), and weights were obtained by direct
ratio estimation of "importance". Most sessions, both analyst and
MAUD, lasted between 1 and 2 hours.
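
The aggregation step of a SMART-like analysis can be sketched as follows. This is a minimal illustration, assuming the usual SMART convention that ratio judgments of importance are normalized to sum to one and combined additively with 100-point ratings; the alternatives, attributes, and numbers are hypothetical and are not data from the study.

```python
def smart_values(ratings: dict, importance: dict) -> dict:
    """Additive aggregate value per alternative from 100-point ratings.

    ratings:    {alternative: {attribute: rating on a 0-100 scale}}
    importance: {attribute: ratio judgment of importance}
    """
    total = sum(importance.values())
    weights = {attr: judged / total for attr, judged in importance.items()}
    return {
        alt: sum(weights[attr] * attr_ratings[attr] for attr in weights)
        for alt, attr_ratings in ratings.items()
    }


# Hypothetical "places to live" problem.
importance = {"cost": 2.0, "location": 1.0, "size": 1.0}  # cost judged twice as important
ratings = {
    "apartment_a": {"cost": 80.0, "location": 40.0, "size": 60.0},
    "apartment_b": {"cost": 50.0, "location": 100.0, "size": 70.0},
}

values = smart_values(ratings, importance)
print(values)
print(max(values, key=values.get))  # most preferred alternative
```

Sorting the alternatives by aggregate value gives the implied ordering that subjects were later asked to accept or reject.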

Although subjects overwhelmingly gave more favorable reports for
the analyst session than for the MAUD session, subjects' agreement with and
acceptance of the analyst and MAUD results (implied ordering and most
preferred alternative) did not differ. Specifically, subjects indicated a
desire to use the analyst rather than MAUD in future decisions and
confidence that the analyst rather than MAUD "found" the best alternative
by a ratio of 8 to 1. Most subjects thought that the analyst session was
more helpful (5 to 1), more comfortable (4 to 1), and more effective in
discovering new aspects of the problem (3 to 1) than the MAUD session.

Yet,  only  a  slim  majority  (54%)  chose  the  analyst  ratings  over  the  MAUD 
ratings.  Furthermore,  only  small  differences  were  found  between  analyst 
and  MAUD  ratings  with  respect  to  the  number  of  order  reversals  (Kendall 
Tau)  with  direct  holistic  ratings  by  the  subject.  Likewise,  MAUD  and 
analyst  ratings  did  not  differ  substantially  in  the  number  of  times  they 
matched  the  most  preferred  alternatives.  In  short,  although  subjects 
reported  liking  analyst  sessions  better  than  MAUD  sessions,  no  differences 
emerged  in  subjects'  acceptance  of  resulting  evaluations.  These  results 
were  not  substantially  mediated  by  session  order,  problem  type,  analyst, 
or  subject  sex  and  race. 
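
One way to count order reversals between model-based values and holistic ratings is sketched below. The report does not say which variant of Kendall's tau was computed; this example assumes tau-a (tied pairs count as neither concordant nor discordant), and the alternatives and scores are hypothetical.

```python
from itertools import combinations


def concordance_counts(model: dict, holistic: dict) -> tuple:
    """Return (concordant, discordant) pair counts; tied pairs count as neither."""
    concordant = discordant = 0
    for a, b in combinations(sorted(model), 2):
        product = (model[a] - model[b]) * (holistic[a] - holistic[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1  # an order reversal
    return concordant, discordant


def kendall_tau_a(model: dict, holistic: dict) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant, discordant = concordance_counts(model, holistic)
    n = len(model)
    return (concordant - discordant) / (n * (n - 1) / 2)


model_values = {"A": 65.0, "B": 67.5, "C": 40.0, "D": 55.0}
holistic_ratings = {"A": 70.0, "B": 60.0, "C": 30.0, "D": 50.0}

print(concordance_counts(model_values, holistic_ratings))  # (5, 1): one order reversal
print(round(kendall_tau_a(model_values, holistic_ratings), 3))
```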

Two analysts tended to differ systematically in subjects' approval of
their sessions and subjects' acceptance of the final ratings, as well as
in the number of attributes generated. The median number of attributes
elicited was greater for analyst sessions (7.5) than for MAUD sessions
(5.9); however, one analyst averaged 10 attributes per session, while another
averaged only a little over 5. The 10-attribute analyst was rated higher
than the other four analysts in terms of subjects' impressions of the sessions,
but received the lowest amount of acceptance of the resulting alternative
orderings. The five-attribute analyst, however, received the lowest
subjective ratings of all, but achieved the greatest degree of acceptance of
final alternative orderings. Our findings seem to indicate that subjects
feel better taken care of when more attributes are included in the analysis,
but that subjects' holistic ratings are better accounted for by analyses
with smaller rather than larger numbers of attributes. Of course, it is
easy to overinterpret these results, and replication is certainly required.
is used in some heuristic, non-optimal manner. Invariably, the very same
data (ours included) that could be used to reject the no effect hypothesis
can also be used to prove that the subject is not following the optimal
model. The reason is simple: subjects made some use of the manipulated
information (reliability), but not optimal use. This often results in
seemingly contradictory findings: reliability information had an effect
on responses (no effect rejected), yet subjects' responses were not even
ordinally consistent with the model, due to a neglect of reliability
information. Both of these statements are true of our study; unfortunately,
researchers sometimes choose to emphasize no effect rejections and ignore
optimal model rejections (and vice versa), in order to support a particular
theoretical position. It makes more sense to look at the data both ways, and
attempt to discover the exact heuristic employed, if possible.
Overall, our subjects became quite involved in both MAUD and analyst
sessions. Subjective ratings of both sessions were greatly skewed toward
the high end. Subjects were highly motivated, and their responses seemed
more thoughtful and considered than is often the case for thought experiments
with hypothetical scenarios, typical of laboratory experiments with college
subjects.

Unfortunately,  it  is  somewhat  difficult  to  interpret  subjects'  acceptance 
of  the  resulting  ordering  of  an  analysis  within  this  paradigm.  Low  acceptance 
could  mean  that  the  analysis  has  totally  gone  awry,  or  it  could  be  indicative 
of  a  deeper,  more  valid  evaluation  than  the  subject  is  capable  of  in  his/her 
holistic  ratings.  Regardless  of  the  interpretation,  it  is  important  that 
analyst  and  MAUD  orderings  did  not  differ  in  terms  of  subject  acceptance. 

Of course, our findings cannot be interpreted in a vacuum. Proper
consideration should be given to the subject population, problem types,
analyst experience and method (SMART), and the particular MAUA software we
employed (MAUD). In particular, we should comment on the peculiarities of
the MAUD program. We found that MAUD is not as "stand alone" as its developers
have advertised. In particular, our subjects needed at least some instruction
in the attribute elicitation phase of the program. Typical mistakes included:
repetition of attributes (up to 15 times); including more than one attribute
in a given attribute definition; and thinking about other attributes when
specifying the "ideal point" and/or scale values on an attribute. MAUD
should give the subject more information concerning attribute elicitation,
as the "difference questions" are simply too abstract and nondirective.

We also found that very few subjects are able to answer the brlts weighting
question properly. In particular, most exhibit a sort of risk aversion, in
which they only equate the sure thing to the gamble when the gamble odds
favor the more favorable outcome. Of course, this response strategy will render
the weights almost totally meaningless, since they will be solely dependent
upon the order (essentially random in MAUD) in which the two varying attributes
are presented. In short, the risk aversion problem with brlts may result
in random weights in MAUD. To circumvent this, we intervened at the point
when the subject begins the brlts portion of the program, and attempted to
explain the brlts question in terms of "importance" of the two varying
attributes. In particular, indifference between the sure thing and the
gamble for odds of 1:1 was equated to attributes of equal importance. Odds
of greater and less than 1:1 were also explained in terms of the relative
importance of the attributes.
It is important to note that all interventions into the MAUD sessions
(both for attribute elicitation and brlts questions) were kept short and
detached from the flow of the interaction between MAUD and the subject.
Information and instructions were given only for clarifying those points
necessary for the MAUD assessments. In short, all extra-MAUD interaction
within the MAUD session was kept as unobtrusive as possible.

Results concerning quality of attribute sets (in terms of completeness,
value independence, etc.) are included in the full technical report.

III.  Hierarchical  vs.  nonhierarchical 
MAU  structures 

Complex evaluation problems can usually be aided by the construction
of a value tree which organizes general values, intermediate objectives, and
final value-relevant attributes in a hierarchy. MAU models can then be
built in two ways:

1)  By  ignoring  the  hierarchical  structure  and  performing  the 
weighting  and  rating  tasks  on  lowest  attributes  (twigs)  only; 

2)  By  weighting  branches  at  each  level  of  the  tree  under  a  given 
node  and  computing  final  attribute  weights  by  multiplying  down 
the  tree. 

The hierarchical weighting model can, furthermore, be coupled with
ratings of options on different levels of the tree to examine the internal
consistency of MAU models and judgments at various levels of aggregation.
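
The second approach, multiplying weights down the tree, can be sketched in Python as follows. The tree layout, attribute names (apart from the health/safety/environment factor mentioned in this report), and weights are hypothetical; the only assumption carried over from the text is that the weights under each node are normalized and that a twig's final weight is the product of the weights along its path.

```python
def twig_weights(tree: dict, parent_weight: float = 1.0) -> dict:
    """Final twig weights obtained by multiplying local weights down the tree.

    tree maps a branch name to (local_weight, subtree), where subtree is
    None for a twig (lowest-level attribute).
    """
    result = {}
    for name, (local_weight, subtree) in tree.items():
        cumulative = parent_weight * local_weight
        if subtree is None:
            result[name] = cumulative
        else:
            result.update(twig_weights(subtree, cumulative))
    return result


# Hypothetical value tree; weights under each node sum to 1.
value_tree = {
    "health/safety/environment": (0.5, {
        "public health": (0.6, None),
        "environmental impact": (0.4, None),
    }),
    "economics": (0.3, {
        "cost": (0.7, None),
        "jobs": (0.3, None),
    }),
    "reliability of supply": (0.2, None),
}

final_weights = twig_weights(value_tree)
print(final_weights)
print(sum(final_weights.values()))  # close to 1.0 when every node is normalized
```

With the nonhierarchical approach, the same five twig weights would instead be judged directly against one another in a single step.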

Hierarchical weighting is especially attractive when the number of
attributes is very large, partly because it reduces the number of necessary
judgments, and partly because it avoids weighting questions in which only
remotely related attributes need to be compared. But there are also problems;
for example, respondents may add to an upper level value meaning not
captured by its lower branches; ranges of alternatives, which can often
be made explicit in specific attributes, become more vague at higher levels,
perhaps distorting range dependent importance weights; furthermore, it is
not clear whether numbers elicited hierarchically and nonhierarchically
are consistent. If not, which numbers should we trust more?

To answer some of these questions, Stillwell, von Winterfeldt, and
Edwards (Technical Report No. 81-2) performed an experiment in which 37
undergraduate students evaluated three scenarios for electricity production
(coal vs. nuclear vs. geothermal coupled with strict conservation measures).
MAU ratings and weights were elicited for 13 attributes, which a previous
study had found relevant for this evaluation.
These same attributes were then arranged in a hierarchical fashion, and
subjects judged importance on each of three levels of the hierarchy.
Upper levels are simply combinations of lower level attribute sets. Final
weight for a lower level attribute (twig) is computed by multiplying weights
that include that twig at each of the three levels of the hierarchy.
Hierarchical weights were found to be more variable than nonhierarchical
weights; this finding essentially replicates a result reported by Sayeki
and Vesper in 1973.

From the standpoint of structuring, a more interesting result followed
from subjects' attribute ratings of alternatives (location measures) at
each level of the hierarchy. Surprisingly, subjects are often inconsistent in
their ratings. For example, coal may be rated higher than nuclear on an
upper level attribute, while nuclear is rated higher than coal on all of the
sub-attributes making up that attribute. Three explanations seem plausible.
First, subjects may simply err in expressing their values. A second
possibility is that our subjects' values are extremely labile; between
the time subjects were asked to rate options at the lower and higher levels,
their values changed. The third explanation is that subjects imbue higher
level attributes with a richer meaning than the structure below that attribute
warrants, creating a sort of "super dimension" of undelineated aspects.
Further research is warranted on this topic, as a case can be made for all
three interpretations.

Finally, we found that subjects' weight sets did not differ to any
great extent as a function of their preferred alternatives, e.g., subjects
favoring the nuclear option assigned weights which were similar to those
preferring coal. All groups gave the highest weight to the health/safety/
environment factor. Instead of weights, location measures were the primary
determinant of preference orderings among the three alternatives. This
suggests that preferences are not determined by the extent to which one is
willing to trade off one attribute for another, as has often been asserted.
Contrarily, it appears from these data that one's perception of the
alternatives' standing on the various attributes determines preference (or
vice versa). Weights seem to play little role. Brickman, Shaver, and
Archibald (1969) reported much the same finding in an attitude study of
various foreign policies of the United States towards the Vietnam conflict.
In a study on attitudes towards nuclear power, Otway, Maurer, and Thomas
(1978) also found that attitude differences between pro-nuclear groups
and anti-nuclear groups are largely due to beliefs, not to values. This
result is not encouraging for those who view attribute ratings of
alternatives as a task for "technicians" who should know the "facts", and
importance weighting as the primary function of the decision maker charged
with the value laden task of trading off one deserving goal for another.
Policy decision making, at least in arenas as highly charged as nuclear
power and Vietnam, may be more a matter of one's perception of how well
each strategy will accomplish the stated goals, and not one sensitive
to the tradeoffs among different goals.

This result, coupled with the finding that attribute ratings of
alternatives are highly inconsistent, is disconcerting. In effect, this
study suggests that the structure and labels of attributes may, to a great
extent, determine the final preference ordering of the analysis.

IV. Group structure, datum diagnosticity,
and source reliability in hierarchical inference

Many important and interesting probabilistic information processing
tasks are essentially hierarchical or cascaded in formal structure, and
involve situations in which different people have different types of
probabilistic information which must be combined. Griffin and Edwards (for
a more complete description see Technical Report No. 81-3) studied a simple
cascaded inference task in which one individual has information about the
diagnosticity of an event and the other has information about the probability
that the event has occurred (reliability information). The main purpose of
this experiment was to compare two people working together when combining
such information with an individual with both types of information, and to
assess the relative impact of reliability and diagnosticity in both situations.

In the symmetric, two hypothesis case, reliability and diagnosticity
should be equally important in determining the aggregate odds:

$$L = \frac{L_r L_d + 1}{L_r + L_d}, \qquad (1)$$

where $L$ is the aggregate odds, $L_r$ is the reliability likelihood ratio, and
$L_d$ is the diagnosticity likelihood ratio. Thus, diagnostic information at
odds of 10 to 1, that has only a 2 to 1 chance in favor of being true, should
be equally as convincing as diagnostic information at odds of 2 to 1, that
has a 10 to 1 chance in favor of being true.
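
The equivalence claimed in this example can be verified numerically. The sketch below assumes that equation (1) reads L = (L_r L_d + 1) / (L_r + L_d), a form that is symmetric in the two likelihood ratios as the text requires; the function name is ours.

```python
def aggregate_odds(l_r: float, l_d: float) -> float:
    """Aggregate odds for the symmetric two-hypothesis case, equation (1) as assumed."""
    return (l_r * l_d + 1.0) / (l_r + l_d)


strong_datum_weak_source = aggregate_odds(l_r=2.0, l_d=10.0)
weak_datum_strong_source = aggregate_odds(l_r=10.0, l_d=2.0)

# Both cases give aggregate odds of 21/12 = 1.75 to 1.
print(strong_datum_weak_source, weak_datum_strong_source)

# A report at even reliability odds (1 to 1) carries no information.
print(aggregate_odds(l_r=1.0, l_d=10.0))
```

Note how far the aggregate odds of 1.75 to 1 fall below the raw diagnostic odds of 10 to 1. A judge who leans mostly on diagnosticity will therefore give responses that are too extreme.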

Undergraduates (31 two-person groups and 10 individuals working alone) were
presented with a scenario in which a judgment had to be made about the
For the most part, the state of the art in decision software is at a level
of data storage, display, and computation as an aid to a sophisticated user.
Almost certainly, the next generation of decision software will be designed to
perform a larger range of analyst functions. We have focused on identifying
potential problems challenging the computerization of decision analysis, and
on assessing the extent to which these problems can be overcome. Two questions
are particularly salient: First, to what extent can the often ill-defined art
of structuring be transformed into software; and secondly, to what extent is
past consumers' satisfaction with decision analysis a function of the formal
methods and procedures of the theory and rationale of decision theory, and to
what degree do other factors such as personal interaction and the establishment
of a rapport account for client approval? We compared multiattribute utility
analyses of personal decision problems of undergraduates performed by a human
analyst vs. those performed by a "stand-alone" software package, Multi Attribute
Utility Decomposition (MAUD). Although subjects overwhelmingly yielded more
favorable reports for the analyst session than for the MAUD session, subjects'
agreement with and acceptance of the analyst and MAUD results (implied ordering
and most preferred alternative) did not differ. We did find that subjects
feel better taken care of when more attributes are included in the analysis,
but that subjects' holistic ratings are better accounted for by analyses with
smaller rather than larger numbers of attributes. We found that MAUD is not
as "stand-alone" as its developers have advertised. In particular, our subjects
needed at least some instruction in the attribute elicitation phase of the
program. We also found that most subjects are unable to answer the brlts weighting
question properly; uninstructed responses exhibit a sort of risk aversion that
renders the weights virtually meaningless. Overall, subjects were highly
motivated, and their responses seemed more thoughtful and considered than is often
the case for thought experiments with hypothetical scenarios, typical of
laboratory experiments with college subjects.

Hierarchical MAU structuring (and weighting) is a particularly attractive
approach to building value trees when the number of attributes is large, partly
because it reduces the number of necessary judgments, and partly because it
avoids weighting questions in which only remotely related attributes need to
be compared. But there are several potential problems: for example, respondents
may add to an upper level value meaning not captured by its lower branches;
ranges of alternatives, which can often be made explicit in specific attributes,
become more vague at higher levels, perhaps distorting range dependent
importance weights; furthermore, it is not clear whether numbers elicited
hierarchically and non-hierarchically are consistent. If not, which should we trust
more? We studied subjects' weighting and rating judgments for both hierarchical
and non-hierarchical value trees relevant to evaluation of alternatives for
electricity production. Hierarchical weights were found to be more variable
than non-hierarchical weights, essentially replicating a result reported in
1973. We also found that subjects are often inconsistent in their attribute
ratings across different levels of the value hierarchy. Random error, value
lability, and misunderstanding of higher level attributes are all possible
explanations for this result. Finally, we found that subjects' weight sets
did not differ to any great extent as a function of their preferred alternatives;
rather, location measures were the primary determinant of preference orderings.
Policy decision making, at least in some highly charged arenas, may be more a
matter of one's perception of how well each strategy will accomplish the stated
goals, and not one sensitive to the tradeoffs among different goals.

Many important and interesting probabilistic information processing tasks
are essentially hierarchical or cascaded in formal structure, and involve
situations in which different people have different types of probabilistic
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likelihood  that  a  job  applicant  will  be  successful,  assuning  that  he  or 
she  is  hired.  The  diagnosticity  information  was  a  test  result  with 
known  validity;  the  reliability  information  was  the  odds  that  an 
unreliable  tester  actually  reported  the  true  versus  a  random  test  result. 
Subjects were asked to give odds based on 12 different diagnosticity-reliability pairs; they
were  told  that  a  monetary  payoff  of  up  to  $5.00  would  be  given  contingent 
on  their  performance.  Both  nomothetic  and  idiographic  analyses  of  the 
responses  indicated  that  diagnosticity  has  a  greater  impact  than  reli¬ 
ability  on  judged  responses.  This  effect  was  not  mediated  by  subject 
sex,  nor  by  whether  the  information  was  integrated  by  an  individual  working 
alone or by two people, each given either diagnosticity or reliability.
Furthermore,  naming  either  the  "reliability"  person  or  the  "diagnosticity" 
person  as  "responsible"  for  the  group  response  did  not  alter  the  findings 
appreciably. 

Substantively,  this  experiment  has  replicated  a  robust  finding  in  the 
subjective  inference/prediction  literature.  Kahneman  and  Tversky  (1973, 
1979)  and  others  have  demonstrated  that  subjects  predicting  a  criterion 


diagnosticity information in the form of contingency tables with appropriate
summary  statistics.  (The  cover  story  involved  an  unreliable  report  of 
pass/fail  on  a  test  that  was  somewhat  diagnostic  of  success/failure  in 
a  job  situation.)  Past  research  on  the  base-rate  fallacy  has  shown  that 
priors will be utilized more when they are either causal to the event
hypotheses (Bar-Hillel, 1980; Tversky and Kahneman, 1980) or imparted
to the subject in a concrete, trial-by-trial manner (Manis, Dovalina,
Avis, and Cardoze, 1980; see also Kassin, 1979). We can only speculate
that such manipulations would have had similar effects on the utilization of
reliability information in our experiment. Although contingency tables
are more concrete than a single summary number, they are still relatively
abstract (cf. Manis et al., 1980).

Methodologically, this experiment points out two important contrasts in
the way judgment and decision researchers view their data: (1) nomothetic
vs. idiographic analysis and (2) no-effect vs. optimal-model null hypothesis
testing (see Einhorn and Hogarth, 1981). Nomothetic analyses (exploring
response patterns of group mean responses) are required when a "between
subjects" design is used. However, when a "within subjects" design is
employed, idiographic analysis (exploring typical response patterns of
individual subject responses) is usually preferred, although either is
appropriate. Both ways of looking at our data suggest that responses are
based on only slightly modified diagnostic information; however, only the
nomothetic analysis gives an idea as to the specific heuristic strategy
subjects actually employ. Patterns of mean responses indicate that subjects'
log responses are quite close to the product of the log diagnostic odds and
the reliability probability. That is, subjects tend to use the reliability
probability to adjust the log diagnostic odds downward. This interpretation
can be distinguished from the alternative heuristic wherein subjects adjust
the diagnostic probability or the diagnostic likelihood ratio with the
reliability probability directly. Because these heuristic models are so
highly correlated with each other and also with the optimal model (modified
Bayes' theorem), an idiographic analysis cannot distinguish among the four
possibilities.
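
The contrast between the log-odds heuristic and an optimal model can be sketched numerically. The report does not spell out its modified Bayes' theorem here, so the optimal model below is an assumption: the report is taken to be the true test result with probability r and otherwise a random result that reads "pass" with probability q = 0.5. The function names, the validities 0.8 and 0.2, and the value of q are all hypothetical and chosen only for illustration.

```python
import math


def optimal_ratio(p_pass_h1, p_pass_h2, r, q=0.5):
    """Likelihood ratio of a reported 'pass' under an ASSUMED mixture model.

    With probability r the report is the true test result; otherwise it is
    a random result that says 'pass' with probability q (hypothetical).
    """
    numerator = r * p_pass_h1 + (1.0 - r) * q
    denominator = r * p_pass_h2 + (1.0 - r) * q
    return numerator / denominator


def log_odds_heuristic(diagnostic_ratio, r):
    """Heuristic response: log(response) = r * log(diagnostic ratio)."""
    return math.exp(r * math.log(diagnostic_ratio))


# Hypothetical test validities: P(pass | success) and P(pass | failure).
p_success, p_failure = 0.8, 0.2
diagnostic_ratio = p_success / p_failure

for r in (0.25, 0.5, 0.75, 1.0):
    heuristic = log_odds_heuristic(diagnostic_ratio, r)
    optimal = optimal_ratio(p_success, p_failure, r)
    print(f"r={r:.2f}  heuristic={heuristic:.3f}  optimal={optimal:.3f}")
```

Under these assumptions both models agree when r = 1 and both move toward even odds as r falls, which is one way to see why such models are highly correlated over a small set of stimulus pairs. Reliability given as odds would first be converted to a probability, r = odds / (1 + odds).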

One of the greatest sources of confusion in the judgment literature
(especially with regard to base-rates) is the difference between testing
the null hypothesis that an information manipulation (e.g., reliability)
had no effect, versus testing the optimal model hypothesis. The finding
in our study is the usual one, namely, that the manipulation (reliability)
is used in some heuristic, non-optimal manner. Invariably, the very same
data (ours included) that could be used to reject the no effect hypothesis
can also be used to prove that the subject is not following the optimal
model. The reason is simple: subjects made some use of the manipulated
information (reliability), but not optimal use. This often results in
seemingly contradictory findings: reliability information had an effect
on responses (no effect rejected), yet subjects' responses were not even
ordinally consistent with the model, due to a neglect of reliability
information. Both of these statements are true of our study; unfortunately,
researchers sometimes choose to emphasize no effect rejections and ignore
optimal model rejections (and vice versa), in order to support a particular
theoretical position. It makes more sense to look at the data both ways, and
attempt to discover the exact heuristic employed, if possible.

For the most part, the state of the art in decision software is at a level
of data storage, display, and computation as an aid to a sophisticated user.
Almost certainly, the next generation of decision software will be designed to
perform a larger range of analyst functions. We have focused on identifying
potential problems challenging the computerization of decision analysis, and
on assessing the extent to which these problems can be overcome. Two questions
are particularly salient: first, to what extent can the often ill-defined art
of structuring be transformed into software; and secondly, to what extent is
past consumers' satisfaction with decision analysis a function of the formal
methods and procedures of the theory and rationale of decision theory, and to
what degree do other factors such as personal interaction and the establishment
of a rapport account for client approval? We compared multi attribute utility
analyses of personal decision problems of undergraduates performed by a human
analyst vs. those performed by a "stand-alone" software package, Multi Attribute
Utility Decomposition (MAUD). Although subjects overwhelmingly yielded more
favorable reports for the analyst session than for the MAUD session, subjects'
agreement with and acceptance of the analyst and MAUD results (implied ordering
and most preferred alternative) did not differ. We did find that subjects
feel better taken care of when more attributes are included in the analysis,
but that subjects' holistic ratings are better accounted for by analyses with
smaller rather than larger numbers of attributes. We found that MAUD is not
as "stand-alone" as its developers have advertised. In particular, our subjects
needed at least some instruction in the attribute elicitation phase of the
program. We also found that most subjects are unable to answer the BRLT weighting
question properly; uninstructed responses exhibit a sort of risk aversion that
renders the weights virtually meaningless. Overall, subjects were highly
motivated, and their responses seemed more thoughtful and considered than is often
the case for thought experiments with hypothetical scenarios, typical of
laboratory experiments with college subjects.

Hierarchical MAU structuring (and weighting) is a particularly attractive
approach to building value trees when the number of attributes is large; partly
because it reduces the number of necessary judgments, and partly because it
avoids weighting questions in which only remotely related attributes need to
be compared. But there are several potential problems: for example, respondents
may add to an upper level value meaning not captured by its lower branches;
ranges of alternatives, which can often be made explicit in specific attributes,
become more vague at higher levels, perhaps distorting range-dependent importance
weights; furthermore, it is not clear whether numbers elicited hierarchically
and non-hierarchically are consistent; if not, which should we trust
more? We studied subjects' weighting and rating judgments for both hierarchical
and non-hierarchical value trees relevant to evaluation of alternatives for
electricity production. Hierarchical weights were found to be more variable
than non-hierarchical weights, essentially replicating a result reported in
1973. We also found that subjects are often inconsistent in their attribute
ratings across different levels of the value hierarchy. Random error, value
lability, and misunderstanding of higher level attributes are all possible
explanations for this result. Finally, we found that subjects' weight sets
did not differ to any great extent as a function of their preferred alternatives;
rather, location measures were the primary determinant of preference orderings.
Policy decision making, at least in some highly charged arenas, may be more a
matter of one's perception of how well each strategy will accomplish the stated
goals, and not one sensitive to the tradeoffs among different goals.
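
The comparison of hierarchical and non-hierarchical weights can be made concrete with a minimal sketch. It assumes the usual convention that a bottom-level attribute's hierarchical weight is the product of the normalized weights along its path in the value tree. The two-level tree, the attribute names, and every number below are hypothetical; none come from the study.

```python
def flatten_weights(tree):
    """Multiply normalized weights down each path of a two-level value tree.

    tree maps a branch name to (branch_weight, {leaf_name: leaf_weight}).
    Weights may be on any ratio scale; they are normalized at each level.
    """
    flat = {}
    branch_total = sum(weight for weight, _ in tree.values())
    for branch_weight, leaves in tree.values():
        leaf_total = sum(leaves.values())
        for leaf, leaf_weight in leaves.items():
            flat[leaf] = (branch_weight / branch_total) * (leaf_weight / leaf_total)
    return flat


def normalize(weights):
    """Rescale directly elicited (non-hierarchical) weights to sum to one."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical hierarchical judgments for an electricity-production problem.
tree = {
    "economics": (60, {"generation cost": 70, "supply reliability": 30}),
    "environment": (40, {"air quality": 50, "land use": 50}),
}
# Hypothetical weights elicited directly over the same four attributes.
direct = normalize({
    "generation cost": 35, "supply reliability": 25,
    "air quality": 25, "land use": 15,
})

hierarchical = flatten_weights(tree)
for name in hierarchical:
    gap = hierarchical[name] - direct[name]
    print(f"{name:20s} hier={hierarchical[name]:.2f} direct={direct[name]:.2f} gap={gap:+.2f}")
```

Because each flattened weight is a product of two judgments, differences at the upper level propagate to every attribute beneath it, which is one reason the two elicitation methods need not agree.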

Many important and interesting probabilistic information processing tasks
are essentially hierarchical or cascaded in formal structure, and involve
situations in which different people have different types of probabilistic
information which must be combined. We studied a simple cascaded inference
task in which one individual has information about the diagnosticity of an
event and the other has information about the probability that the event has
occurred (reliability information). Our main purpose was to compare two
people working together with an individual combining both types of information,
and to assess the relative impact of reliability and diagnosticity. Both
nomothetic and idiographic analyses of the responses indicated that diagnosticity
has a greater impact than reliability on judged responses. This effect was not
mediated by subject sex, nor by whether the information was integrated by an
individual working alone or by two people, each given either diagnosticity or
reliability. Naming either the "reliability" person or the "diagnosticity"
person as responsible for the group response did not alter the findings
appreciably. Our results are consistent with the bulk of the subjective inference/
prediction literature that suggests that subjects over-emphasize the impact of
diagnostic information by not taking full account of other relevant information,
such as imperfect correlation in prediction problems, and base-rate information
in Bayesian inference.